ENCODING METHOD AND ARRANGEMENT
Field of the Invention 

This invention relates to encoding and decoding images. In particular,
the invention relates to encoding and decoding video in streaming media
solutions. Streaming media means that a video is transmitted through a
network from a sending party to a receiving party in real time while the
video is shown on the terminal of the receiving party.

Background of the Invention

A digital video consists of a sequence of frames (typically 25 frames
per second), each frame 1 consisting of M1xN1 pixels, see Fig. 1. Each
pixel is further represented by 24 bits in some of the standard color
representations, such as RGB, where the colors are divided into red (R),
green (G), and blue (B) components, each of which is expressed by a
number ranging between 0 and 255. A capacity of M1xN1x24x25 bits per
second (bps) is needed for transmitting all this information. Even a
small frame size of 160x120 pixels yields 11.5 Mbps, which is beyond the
bandwidth of most fixed and, in particular, all wireless Internet
connections (from 9.6 kbps (GSM) to some hundreds of kbps within the
reach of WLAN). However, all video sequences contain some amount of
redundancy and may therefore be compressed.
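
The raw bit rate quoted above can be checked with a few lines of Python.
This is only an illustrative sketch; the function name raw_bitrate_bps is
an assumption made for this example and does not come from the text.

```python
def raw_bitrate_bps(width, height, bits_per_pixel=24, frames_per_second=25):
    """Uncompressed video bit rate in bits per second (M1 x N1 x 24 x 25)."""
    return width * height * bits_per_pixel * frames_per_second


if __name__ == "__main__":
    bps = raw_bitrate_bps(160, 120)
    # 160 * 120 * 24 * 25 bits per second, shown also in Mbps
    print(f"{bps} bps = {bps / 1e6:.2f} Mbps")
```

For the 160x120 frame of the example this gives 11 520 000 bps, which the
text rounds to 11.5 Mbps.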

Any video signal may be compressed by dropping some of the frames, i.e.,
reducing the frame rate, and/or by reducing the frame size. In color
videos, a clever choice of the color representation may further reduce
the visually relevant information to one half or below, for example by
the standard transition from the RGB to the YCrCb representation. YCrCb
is an alternative 24-bit color representation obtained from RGB by a
linear transformation. The Y component takes values between 0 and 255
corresponding to the brightness or the gray-scale value of the color.
The Cr and Cb components take values between -128 and +127 and define
the chrominance or color plane. In polar coordinates, the angle around
the origin, or hue, determines the actual color while the distance from
the origin corresponds to the saturation of the color. In what follows,
these kinds of steps are assumed taken and the emphasis is on optimal
encoding of the detailed information present in the remaining frames.
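
As an illustration of such a linear transformation, the sketch below
converts one RGB pixel to Y, Cr, and Cb and then reads hue and saturation
off the chrominance plane in polar form. The text does not state which
coefficients are meant; the full-range ITU-R BT.601 values used in JPEG
are assumed here, and the function names are hypothetical.

```python
import math


def rgb_to_ycrcb(r, g, b):
    """Linear RGB -> (Y, Cr, Cb); coefficients are an assumption (BT.601, full range)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cr, cb


def hue_and_saturation(cr, cb):
    """Polar view of the chrominance plane: angle = hue, radius = saturation."""
    saturation = math.hypot(cb, cr)
    hue_degrees = math.degrees(math.atan2(cr, cb)) % 360.0
    return hue_degrees, saturation


if __name__ == "__main__":
    y, cr, cb = rgb_to_ycrcb(255, 255, 255)   # white: full brightness, no chrominance
    print(round(y, 3), round(cr, 3), round(cb, 3))
    print(hue_and_saturation(*rgb_to_ycrcb(255, 0, 0)[1:]))
```

Gray-scale pixels map to zero chrominance, which is why the rest of the
discussion can concentrate on the luminance component alone.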

All video compression techniques utilize the existing correlations
between and within the frames, on the one hand, and the understanding of
the limitations of the human visual system, on the other. The
correlations, such as immovable objects and areas with constant
coloring, may be compressed without loss, while the omission of
invisible details is by definition lossy. Further compression requires
compromises to be made in the accuracy of the details and colors in the
reproduced images.

In the absence of cuts (changes of scene) in a video, the consecutive
frames differ only if the camera and/or some of the objects in the scene
have moved. Such a series of frames can be efficiently encoded by
finding the directions and magnitudes of these movements and conveying
the resulting motion information to the receiving end. This kind of
procedure is called motion compensation; the general idea of referring
to the previous frame is known as INTER (frame) encoding. Thus an INTER
frame closely resembles the previous frame(s). Such a frame can be
reconstructed with the knowledge of the previous frame and some amount
of extra information representing the changes needed. To get an idea of
the achievable compression ratios, let us consider an 8x8 block 2 (see
Figs. 2 and 3), which corresponds to 8x8x24=1536 bits in the original
form. If the movement of the block between two consecutive frames 1 is
limited to between, e.g., -7 and 7 pixels, the two-dimensional motion
vector can be expressed with 8 bits, resulting in a compression ratio
of 192.
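
The arithmetic behind the ratio of 192 can be sketched as follows. The
helper names are assumptions made for this example; the sketch counts
only the motion vector bits and ignores any correction information.

```python
import math


def motion_vector_bits(max_displacement):
    """Bits for a 2-D motion vector whose components lie in [-max, +max]."""
    values_per_component = 2 * max_displacement + 1      # e.g. -7..7 -> 15 values
    bits_per_component = math.ceil(math.log2(values_per_component))
    return 2 * bits_per_component


def block_bits(block_size, bits_per_pixel=24):
    """Bits in an uncompressed block of block_size x block_size pixels."""
    return block_size * block_size * bits_per_pixel


def compression_ratio(block_size, max_displacement):
    return block_bits(block_size) / motion_vector_bits(max_displacement)


if __name__ == "__main__":
    print(block_bits(8), motion_vector_bits(7), compression_ratio(8, 7))
```

Each component needs 4 bits for its 15 possible values, so the vector
takes 8 bits and the 1536-bit block is compressed by a factor of 192.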

In order for this procedure to work, the first frame after each cut
needs to be compressed as such; this is called INTRA encoding. Thus an
INTRA frame is a video frame that is compressed as a separate image with
no references made to any other frame. INTRA frames are needed at the
beginning of a video, at cuts, and to periodically refresh the video in
order to recover from errors.

Retaining good visual quality of the compressed videos is just one of
the many requirements facing any practical video compression technology.
For commercial purposes, the encoding process should be reasonably fast
in order to facilitate the encoding of large amounts of video content.
Apart from a possible initial buffering of frames in the computer's
memory, the viewing of a video typically occurs in real time, demanding
real-time decoding and playback of the video. The range of intended
platforms from PCs to PDAs (personal digital assistants) and possibly
even to third generation mobile phones sets further constraints on the
memory usage and processing power needs of the codecs.

Fast decoding is even more important for the so-called streaming videos,
which are transmitted to the receiver in real time as he or she is
watching them. For streaming videos, a limited data transmission
capacity determines a minimum compression ratio over the full length of
the video. This is because the bit rate for transmitting the video must
remain within the available bandwidth at all times.

Most video compression technologies comprise two components: an encoder
used in compressing the videos and a decoder or player to be downloaded
and installed in the computers of all the prospective viewers of the
videos. Although this downloading needs to be done only once for each
player version, there is a growing interest in player-free streaming
video solutions, which can reach all Internet users. In such solutions,
a small player application is transmitted to the receiving end together
with the video stream. In order to minimize the waiting time due to this
overhead information, the application, i.e., the decoder, should be made
extremely simple.

For present purposes it is sufficient to consider gray-scale
frames/images (color images and different color representations are
straightforward generalizations of what follows). The gray-scale values
of the pixels are denoted as the luminance Y. These form a
two-dimensional array in a frame, and the challenge to the encoding
process is to perform the compression and decompression of this array in
a way that retains as much of the visually relevant information in the
image as possible.

In the INTRA mode (the video or image compression technique used in
encoding INTRA frames), each frame is just a gray-scale bitmap image. In
practice the image is typically divided into blocks of NxN pixels 2 and
each block is analysed independently of the others, see Fig. 3.

The simplest way to compress the information for an image block is to
reduce the accuracy in which the luminance values are expressed. Instead
of the original 256 possible luminance values one could consider 128
values (0, 2, ..., 254) or 64 values (0, 4, ..., 252), thereby reducing
the number of bits per pixel needed to express the luminance information
by 12.5% and 25%, respectively. At the same time such a scalar
quantization procedure induces encoding errors; in the previous
exemplary cases the average errors are 0.5 and 1 luminance unit per
pixel, respectively. Scalar quantization is very inefficient, however,
since it neglects all the correlations between neighbouring pixels and
blocks that are present in any real image.
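
The bit savings and average errors quoted for scalar quantization can be
reproduced with the sketch below. It assumes that every luminance value
is mapped to the nearest allowed level and that all 256 input values are
equally likely; both assumptions, and the function names, are
illustrative and not stated in the text.

```python
def quantize(value, step):
    """Map a luminance value (0..255) to the nearest level in 0, step, 2*step, ..."""
    levels = range(0, 256, step)
    return min(levels, key=lambda level: abs(value - level))


def mean_abs_error(step):
    """Average absolute error over all 256 equally likely luminance values."""
    return sum(abs(v - quantize(v, step)) for v in range(256)) / 256


def bit_saving(step):
    """Fraction of the original 8 bits per pixel saved by the coarser scale."""
    level_count = 256 // step
    bits_needed = (level_count - 1).bit_length()
    return (8 - bits_needed) / 8


if __name__ == "__main__":
    for step in (2, 4):
        print(step, bit_saving(step), mean_abs_error(step))
```

With a step of 2 the average error is 0.5; with a step of 4 it is close
to 1 (slightly above, because the values 253 to 255 have no level above
them to round to).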

One way to account for the correlations between the pixels is to
conceive the image, i.e., the luminance values of the pixels, as a
two-dimensional surface. Many of the existing image compression
algorithms are based on functional transforms in which the functional
form of this surface is decomposed in terms of some set of basis
functions.

The most widely used transforms are the discrete cosine transform (DCT)
and the discrete wavelet transform (DWT), where the basis is formed by
cosines and wavelets, respectively. Larger block sizes account for
correlations between the pixels over longer distances; the number of
basis functions increases as N^2 at the same time. In the JPEG and MPEG
standards, for example, the block size for the DCT coding is 8x8. The
key difference between DCT and DWT is that, in the former, the basis
functions are spread across the whole block while, in the latter, the
basis functions are also localized spatially.

In the INTER mode, the motion compensated blocks may not quite match the
originals. (An INTER mode is a video compression technique used in
compressing INTER frames or blocks therein. INTER modes refer to the
previous frame(s) and possibly modify them. Motion compensation
techniques are representative INTER modes.) In many cases, the resulting
error is noticeable but still so small that it is easier to convey the
correction information to the receiving end than to encode the whole
block anew. This is because the errors are typically small and can be
expressed with a lower number of bits than the luminance values in an
actual image block. Apart from this distinction, the difference blocks
can be encoded in a similar fashion as the image blocks themselves.

As an alternative to the functional transforms one can employ vector
quantization (VQ). In VQ methods, the NxN image blocks 2, or vectors 3
(see Fig. 3), are matched to vectors of the same size from a pre-trained
codebook. For each block, the best matching code vector is chosen to
represent the original image block. All the image blocks 2 are thus
represented by a finite number of code vectors 4, i.e., the vectors are
quantized. The indices of the best matching vectors are sent to the
decoder and the image is recovered by looking up the vectors in the
decoder's copy of the same codebook.
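
A minimal sketch of this basic VQ scheme is given below. The tiny
four-entry codebook and the 2x2 blocks flattened into 4-element vectors
are made-up illustration data, and the squared Euclidean distance used as
the matching criterion is an assumption of this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(vectors, codebook):
    """Encoder: for each vector, the index of the best matching code vector."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]


def vq_decode(indices, codebook):
    """Decoder: a plain table lookup in its own copy of the codebook."""
    return [codebook[k] for k in indices]


if __name__ == "__main__":
    # Hypothetical codebook: dark, mid-gray, bright, and a vertical edge.
    codebook = [[0, 0, 0, 0], [128, 128, 128, 128],
                [255, 255, 255, 255], [0, 255, 0, 255]]
    blocks = [[10, 5, 0, 8], [250, 240, 255, 251], [5, 250, 10, 245]]
    indices = vq_encode(blocks, codebook)
    print(indices)
    print(vq_decode(indices, codebook))
```

The asymmetry is visible even in this toy case: the encoder searches the
whole codebook for every block, while the decoder only indexes a table.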

The encoding quality of VQ depends on the set of training images used in
preparing the codebook and the number of vectors in the codebook. The
dimension of the vector space depends quadratically on the block
dimension N (N^2 pixel values), whereas the number of possible vectors
grows as 256^(N^2); the vectors in the codebook should be representative
of all these vectors. Therefore, in order to maintain a constant quality
of the encoded images while increasing the block size, the required
codebook size increases exponentially. This fact leads to huge memory
requirements and, quite as importantly, to excessively long search times
for each vector. Several extensions of the basic VQ scheme have been
proposed in order to attain good quality with smaller memory and/or
search time requirements.

Some extensions, such as the tree-search VQ, only aim at shorter search
times relative to the codebook size. These algorithms do not improve the
image quality (but rather deteriorate it) and are of interest here only
due to their potential for speeding up other VQ based algorithms.

The VQ algorithms aiming at improving the image quality typically use
more than one specialized codebook. Depending on the details of the
algorithm, these can be divided into two categories: they either improve
the encoded image block iteratively, see Fig. 4, such that the encoding
error of one stage is further encoded using another codebook, thereby
reducing the remaining error, or they first classify the image material
in each block and then use different codebooks (411, 412, 413) for
different kinds of material (edges, textures, smooth surfaces). The
multi-stage variants are often denoted as cascaded or hierarchical VQ,
while the latter ones are known as classified VQ. The motivation behind
all these is that by specializing the codebooks, one reduces the
effective dimension of the vector space. Instead of representing all
imaginable image blocks, one codebook can specialize, for example, in
the error vectors whose elements are restricted below a given value
(cascaded) or in blocks with an edge running through them (classified).
In cascaded VQ variants, the vector dimension is often further reduced
by decreasing the block size between the stages.

The key advantages of transform coding technologies are their
analytically predictable properties and the resulting decorrelated
coefficients ranked in terms of their relative importance. These aspects
enable efficient rate-distortion control and scalability of a stream
according to the available transmission line bandwidth.

Transforms, such as DCT, where all the basis functions extend over the
same block area, are more prone to blocking artefacts than DWT-like
approaches, where the spatial location and extension of the basis
functions vary. This difference is evident, e.g., when encoding image
blocks containing sharp edges (sharp transitions between dark and bright
regions). The DCT of such a block yields, in principle, all possible
frequencies in at least one spatial direction. In contrast to this, the
DWT of the block may lead to just a few nonzero coefficients. The DCT,
on the other hand, is more efficient for encoding larger smoothly
varying surfaces or textures, which in turn would require large numbers
of nonzero wavelet coefficients.

In most actual image blocks, the number of zero transform coefficients
is at least comparable to that of the nonzero ones. Hence the encoding
efficiency of the transform techniques is to a large extent determined
by the efficiency of expressing the zeros without using and transmitting
several bits for each and every one of them. In DCT, the coefficients
are ranked from the most important and frequently occurring to the least
important and rarest. The zeros often occur in sequences and are thus
efficiently run-length codable. In DWT, the coefficients are ranked into
spatially distinct hierarchies, where the zero coefficients often occur
at once in whole branches of the hierarchy; such branches can then be
collectively nullified by one code word.

All the transform coding technologies share one major drawback, namely
their computationally heavy decoding, which involves inverse functional
transformations and can be performed fast only on PC-level processors.
These requirements leave out PDA devices and mobile phones. Typically
transform coding is also tied to specific player solutions that need to
be downloaded and installed before any video can be viewed.

Another disadvantage of the transform codecs occurs in the context of
difference encoding. The difference between the original and the encoded
frames and individual blocks depends on the methods used in the initial
encoding of the image. For transform coding methods, the remaining
difference is only due to the quantization errors induced but, for
motion compensation schemes or VQ type techniques, the difference is
often relatively random although of small magnitude. In this case, the
functional transformations yield arbitrary combinations of nonzero
components that may be even more difficult to compress than the
coefficients of the actual image.

The advantages and disadvantages of vector quantization tech- 
niques are quite the opposite from the transform codecs. The compression 
techniques are always asymmetric with the emphasis on an extremely light 
decoding process. In its simplest form, the decoding merely consists of table 
lookups for the code vectors. The player application can be made very small 
in size and sent at the beginning of the video stream. 

A code vector corresponds to a whole NxN block or alternatively to all
the transform coefficients for such a block. If one vector index is sent
for each block, the compression ratio is higher the larger the block
size is. However, a big codebook is needed in order to obtain good
quality for large N. This implies longer times for both the encoding
(the vector search) and the transmission of the codebook to the
receiving end.

On the other hand, the smaller the blocks are, the more accurate the
encoding result becomes. Smaller blocks or vectors also require smaller
codebooks, which require less memory and are faster to send to the
receiving end. The code vector search operation is also faster,
rendering the whole encoding procedure faster. The disadvantage of a
smaller block size is the larger number of indices to be transmitted.

In the improved VQ variants, the vector space is split into parts and
one codebook is prepared for each part. In the cascaded VQ, in
particular, the image quality is improved by an effective increase in
the number of achievable vectors V achieved with the successive stages
of encoding. In the ideal case, where the vectors in the different
stages were orthogonal, adding a stage i with a codebook of V_i vectors
would increase V to V x V_i. This procedure can significantly improve
the image quality with reasonable total codebook size and search times.
This improvement is made at the expense of the number of bits needed to
encode each block; this increases by n if V_i = 2^n. The image quality
is further improved if the block size is reduced between stages.
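
The trade-off between achievable vectors and bits per block can be made
concrete with a small calculation. The sketch below covers only the ideal
(orthogonal) case described above; the function name and the example
codebook sizes are assumptions for illustration.

```python
import math


def cascade_capacity(codebook_sizes):
    """Ideal-case number of achievable vectors and index bits per block.

    codebook_sizes lists the number of code vectors V_i of every stage.
    """
    total_vectors = math.prod(codebook_sizes)                        # V = V_1 x V_2 x ...
    total_bits = sum(math.ceil(math.log2(v)) for v in codebook_sizes)  # n_i bits if V_i = 2^n_i
    return total_vectors, total_bits


if __name__ == "__main__":
    print(cascade_capacity([256]))        # one stage
    print(cascade_capacity([256, 256]))   # two stages
```

Two stages of 256 vectors each store only 512 code vectors in total but
can, in the ideal case, reach 65536 distinct combinations at a cost of
16 bits per block.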

There are two problems with the cascaded VQ, however. Firstly, the
codebooks are typically trained on realistic difference blocks but with
no reference to the human visual system. Consequently, the vectors do
not necessarily make the corrections that are visually the most
pleasing. Secondly, the number of bits needed to encode each block grows
with the number of stages used, and even more rapidly if the block size
is reduced on the way.

The intention of the invention is to alleviate the above-mentioned 
drawbacks. This is achieved in a way described in the claims. 

Summary of the Invention 

The following definitions should be taken into account when reading
this description.

Basic mode. Image or video compression technique designed to encode an
image or a video frame. The term is used as a distinction from
difference modes.

Coding. Compression, encoding. Since compression is the basic action
when coding in this context, coding can be understood as the acts of
performing the compression.

Decoding. Decompression.

Difference mode. Image or video compression technique used to encode the
difference between two frames, usually between the original and encoded
frames. In the latter case, the difference is denoted as the encoding
error.

Distortion. Measure of the encoding error. Typically the Euclidean norm
of the pixelwise differences between the original and encoded luminance
values.

Encoding. Compression.

The solution according to the invention combines the best properties of
several of the existing solutions. In short, it is a variant of the
cascaded VQ with certain improvements acquired from the DCT and DWT
approaches. The fundamental aspects of the invention are that the
codebooks are pre-processed when training them, for predetermining the
frequency distribution of the resulting codevectors, and that each block
is independently coded and decoded using the number of stages of
difference coding needed for coding the particular block. The invention
takes a difference block as input and encodes it further in order to
reduce the remaining error in an efficient manner as compared with the
additional bits required. The difference block may be the result of any
conceivable basic encoding, including basic VQ encoding, motion
compensation, DCT, and DWT. The invention significantly improves the
image quality in proportion to the rate (bps) used, for both INTER and
INTRA encoded frames.

In accordance with the above-mentioned matters, the invention concerns
an encoding method for compressing data, in which method the data is
first encoded and difference data between the original data and the
encoded data is formed, the difference data is divided into one or more
first blocks, which are encoded in at least one stage, each stage
comprising the action of the encoding and, if needed for the next stage,
an action of calculating following difference blocks between the current
difference blocks and the encoded current difference blocks, performing
the consecutive stages in a way that the calculated difference blocks at
the previous stage are an input for the following stage, at each stage
using a codebook, which is specific for the encoding of the stage, until
at a final stage, final difference blocks between the previous
difference blocks and the encoded previous difference blocks are encoded
using the last codebook, the codebooks for said difference blocks
containing codevectors trained with training difference material, and in
that prior to the training, the training difference material is
preprocessed for individually adapting the frequency distribution of
each codevector for weighting to particular information of the data, and
each block is encoded independently using the number of stages needed
for the particular block.

The invention also concerns an encoder, which utilizes the inventive
encoding method in a way that at least one codebook used for coding
differences has been weighted to a specific frequency distribution, and
the encoder comprises evaluation means for assigning the number of
stages needed for the particular block.

Furthermore, taking into account the inventive encoding, the invention
concerns a decoding method for decompressing data, the method comprising
codebooks for the decompression of encoded difference data, wherein at
least one of said codebooks contains codevectors, which have been
weighted to a specific frequency distribution, and using the codebooks
together for producing a decompression result, which comprises at least
the most significant frequencies.

And furthermore, the invention concerns a decoder using codebooks for
the decompression of encoded difference data, wherein at least one of
the codebooks has been weighted to a specific frequency distribution.

Brief Description of the Drawings

In the following the invention is described in more detail by means of
FIGs 1 - 10 in the attached drawings, where

FIG. 1  illustrates an example of a frame of size M1xN1 pixels,
FIG. 2  illustrates an example of a division of a frame into blocks of
        size NxN pixels,
FIG. 3  illustrates an example of a block of size NxN pixels, a vector
        representing the block, and a code vector for quantizing the
        vector,
FIG. 4  illustrates an example of a known vector quantization
        arrangement,
FIG. 5  illustrates an example of the training of difference material
        according to the invention,
FIGS. 6 and 7 illustrate a simple example of the inventive way to code
        each block with a block specific number of coding stages,
FIG. 8  illustrates an example of an arrangement containing evaluation
        means according to the invention,
FIG. 9  illustrates an example of a flow chart describing the inventive
        method, and
FIG. 10 illustrates an example of an arrangement for the invention.

Detailed Description of the Invention

FIG. 4 illustrates an example of a known vector quantization
arrangement. The invention significantly improves the performance of the
arrangement, expanding the range of applications in which the
arrangement can be used. It should be noted that where a block is
mentioned in the singular in this text, the singular form is used only
to make the text easier to follow; in reality, all blocks of the images
are coded/decoded.

Let us consider an original 8x8 block. At the first stage, this block is
coded 41 using either one codebook 45 or alternatively several codebooks
411. As can be noticed, classified codebooks can be used in a cascaded
VQ. Since the coding concerns the original block, the first stage
belongs to the basic mode. The difference 416 between the original block
and the coded block is calculated 48. The difference, i.e. the encoding
error, can, for example, be measured in standard terms as the distortion

    d^{N} = \sum_{i=1}^{N} \sum_{j=1}^{N} d_{ij},
    \qquad d_{ij} = (Y_{ij} - Y'_{ij})^{2}

where d^{N} denotes the total distortion for an NxN block and d_{ij} the
distortion of the pixel in the ith row and jth column of the block;
Y_{ij} and Y'_{ij} are the luminance values of that pixel in the
original and encoded blocks, respectively.

The distortion block is divided 414 into four 4x4 subblocks 417, which
are encoded 42 at a second stage (the difference mode) using codebook A
46 or alternatively several codebooks 412. Each difference coded 4x4
block is subtracted 49 from the original 4x4 difference block. The
remaining differences 418 are then further divided 415 into four 2x2
subblocks. Each 2x2 difference block 419 is encoded 43 using another
codebook B 47 or alternatively codebooks 413. Each coded 2x2 difference
block is subtracted 410 from the original 2x2 difference block for
achieving the final remaining difference. It should be noted that the
block sizes might alternatively remain the same at each stage, in which
case the divisions of the blocks are not performed.
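
The multi-stage principle of FIG. 4 can be sketched in a few lines. For
brevity the sketch uses the variant mentioned above in which the block
size stays the same at every stage, so no subdivision is performed. The
squared-difference distortion, the toy codebooks, and all function names
are assumptions of this example rather than details taken from the text.

```python
def distortion(original, encoded):
    """Sum of squared pixelwise differences (assumed form of the distortion)."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(original, encoded))


def nearest_index(vector, codebook):
    """Index of the code vector with the smallest distortion to the vector."""
    return min(range(len(codebook)), key=lambda k: distortion(vector, codebook[k]))


def cascaded_encode(block, codebooks):
    """Encode a block stage by stage; each stage encodes the previous residual."""
    indices = []
    residual = list(block)
    for codebook in codebooks:
        k = nearest_index(residual, codebook)
        indices.append(k)
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return indices, residual


def cascaded_decode(indices, codebooks):
    """Decode by summing the looked-up code vectors of all stages."""
    block = [0] * len(codebooks[0][0])
    for k, codebook in zip(indices, codebooks):
        block = [b + c for b, c in zip(block, codebook[k])]
    return block


if __name__ == "__main__":
    basic = [[0, 0, 0, 0], [100, 100, 100, 100], [200, 200, 200, 200]]
    difference = [[0, 0, 0, 0], [10, 10, -10, -10], [-10, 10, -10, 10]]
    block = [112, 108, 91, 89]
    indices, residual = cascaded_encode(block, [basic, difference])
    decoded = cascaded_decode(indices, [basic, difference])
    print(indices, decoded, distortion(block, decoded))
```

In this toy case the basic stage alone leaves a distortion of 410, and
the single difference stage reduces it to 10.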

Each codebook is trained with realistic 'image' material, i.e., in the
difference mode with actual difference blocks occurring at the stage
where the codebook is to be used. The training consists of finding a
given number of vectors which represent the training set as well as
possible. This is achieved using the standard k-means algorithm. The
measure of goodness is the Euclidean distance between the training
vectors and the code vectors closest to them.
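
A compact sketch of k-means codebook training is shown below. The way
the initial code vectors are picked (evenly spaced samples of the
training set), the fixed iteration count, and the function names are
assumptions of this example; the text only states that the standard
k-means algorithm is used.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(training_vectors, codebook_size, iterations=20):
    """Plain k-means: returns codebook_size code vectors for the training set.

    Requires at least codebook_size training vectors.
    """
    # Assumed initialisation: evenly spaced samples from the training set.
    step = len(training_vectors) // codebook_size
    codebook = [list(training_vectors[i * step]) for i in range(codebook_size)]
    for _ in range(iterations):
        # Assignment step: attach every training vector to its closest code vector.
        clusters = [[] for _ in codebook]
        for v in training_vectors:
            k = min(range(len(codebook)),
                    key=lambda i: squared_distance(v, codebook[i]))
            clusters[k].append(v)
        # Update step: move each code vector to the mean of its cluster.
        for k, members in enumerate(clusters):
            if members:
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


if __name__ == "__main__":
    training = [[0, 0], [2, 0], [0, 2], [2, 2],
                [10, 10], [12, 10], [10, 12], [12, 12]]
    print(train_codebook(training, 2))
```

With the two well separated groups of the example, the code vectors
settle at the two group means.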

Thus far the described procedure is equivalent to the usual cascaded VQ
and possesses the same virtues, such as the simple decoding. The
invention consists of two modifications thereof that are designed to
solve the main weaknesses and strengthen the performance.

First, the material used in the training of the codebooks is to be
pre-processed 51 for predetermining the frequency distribution of the
resulting codevectors. This is done by cosine transforming all the
training blocks, removing some selection of frequencies by setting their
coefficients to zero, and finally obtaining the new training block via
the inverse transformation. It should be noted that DCT is not the only
way to preprocess the training material; another suitable functional
transform can be used.
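
The pre-processing of one training block can be sketched as follows:
forward two-dimensional DCT, zeroing of the unwanted coefficients, and
inverse DCT. The orthonormal DCT-II used here, the keep predicate, and
all function names are assumptions of this example; the text does not
fix the normalisation or the exact selection of frequencies.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix C with C[u][x] = a(u) * cos((2x + 1) u pi / (2n))."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def preprocess_block(block, keep):
    """Return the block with every DCT coefficient (u, v) removed unless keep(u, v)."""
    n = len(block)
    c = dct_matrix(n)
    ct = transpose(c)
    coeffs = matmul(matmul(c, block), ct)        # forward 2-D DCT: C B C^T
    for u in range(n):
        for v in range(n):
            if not keep(u, v):
                coeffs[u][v] = 0.0               # remove the selected frequency
    return matmul(matmul(ct, coeffs), c)         # inverse 2-D DCT: C^T X C


if __name__ == "__main__":
    block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    # Remove only the constant (lowest frequency) component.
    zero_mean = preprocess_block(block, lambda u, v: (u, v) != (0, 0))
    print([[round(x, 3) for x in row] for row in zero_mean])
```

Removing only the coefficient (0, 0) subtracts the block mean, which
yields a training block with zero mean value; other keep predicates
select low or intermediate frequency bands in the same way.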



The motivation behind this procedure is twofold. For one thing, it is
visually more important to focus the limited number of bits on
correcting the low frequency errors than to try to correct the whole
block containing all frequencies. The coefficients, representing
frequencies, can be ranked in terms of their importance for the human
observer: the eye is more sensitive to the lower frequencies than to the
higher ones. This does not necessarily mean low frequencies in some
absolute terms, since all frequencies are higher the smaller the block
size is, and hence the spread of the function basis. In other words, the
resulting code vector (or vectors) is adapted to a desired frequency
distribution.

Secondly, all the code vectors in two or more codebooks trained with
distinct frequency regimes are at least nearly orthogonal and can be
efficiently used together to complement each other. This increases the
number of possible code vectors achieved with the basic encoding and two
or more stages of difference encoding. The restriction of the vectors to
a limited number of DCT frequencies effectively reduces the vector
dimension. For this reason, a codebook of a given size matches the
training vectors better than if no frequency selection had been done.
This fact leads to still more effective encoding of the visually
important components in the difference blocks.

Frequency distributions suitable for applications include: blocks with
just the lowest frequencies, blocks with zero mean value, and blocks
with the intermediate frequencies (higher than in the first case but not
the highest ones). After the preprocessing, the actual training is
performed 52, from which the best matching code vectors 53 are found.

The second modification to the standard cascaded VQ concerns the spatial
adaptability of the difference encoding. In the spirit of DWT, the usage
of further difference modes is decided separately for each block. Thus
the coding of one block may involve several successive stages of
difference encoding while its neighbouring block is deemed to be encoded
well enough with fewer stages.

FIGS. 6 and 7 illustrate a simple example of the inventive way to code
each block with a block specific number of coding stages. FIG. 6 shows
an 8x8 block ORG, which is coded (compare FIG. 4), and the difference
between the original and the coded block is divided (FIG. 4, 417) into
4x4 blocks D1A to D1D at the first stage. After this, each block is
examined for the need of a further stage of coding. Since the original
8x8 block illustrates a line 61 across a uniform background, the coding
of the first stage is sufficient for block D1A, wherein only the
background exists. The other blocks D1B to D1D need further coding
according to the examination.

FIG. 7 shows a division of the coded 4x4 difference blocks (FIG. 4, 415)
into 2x2 blocks D22A-D22D, D23A-D23D, and D24A-D24D at the second stage.
After the division, each block is examined for the need of a further
stage of coding. Since blocks D22A, D22B, D22C, D23A, D23B, D23C, D24B,
D24C, and D24D illustrate only a minor part of the line 61 across the
uniform background or purely the background, the coding of the second
stage is sufficient for these blocks. The other blocks D22D, D24A, and
D23D further need a third stage of coding. As a result of coding the
original 8x8 block, one 4x4 block, i.e. block D1A, has been coded using
one stage, several 2x2 blocks (blocks D22A, D22B, D22C, D23A, D23B,
D23C, D24B, D24C, and D24D) have been coded using two stages, and three
2x2 blocks (D22D, D24A, and D23D) have been coded using three stages.

The decision to use additional stages of coding is based on
rate-distortion considerations in the form of a cost function involving
the relative cost of using further bits while achieving some reduction
in the block's distortion. In other words, if the cost of using an
additional stage is too high, the additional stage(s) are unnecessary.
The cost function may be weighted in a desired way, i.e. by weighting
the cost of the bits used in proportion to the distortion. Furthermore,
understood in a broad sense, the weighting takes into account the
weighted use of bits per distortion value (such as a distortion value of
the luminance or chrominance components). The use of bits may be
weighted linearly or nonlinearly over the range of distortion values.
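
One possible form of such a decision rule is sketched below. The text
does not specify the cost function; the linear comparison of the
distortion reduction against a weighted bit cost, the parameter name
bit_cost, and the example numbers are all assumptions made for
illustration.

```python
def needs_further_stage(distortion_before, distortion_after, extra_bits,
                        bit_cost=1.0):
    """True when the distortion reduction outweighs the weighted cost of the bits.

    bit_cost is a hypothetical weight expressing how much distortion
    reduction one additional bit must buy to be worth sending.
    """
    gain = distortion_before - distortion_after
    return gain > bit_cost * extra_bits


if __name__ == "__main__":
    # A nearly flat background block: a further stage barely helps.
    print(needs_further_stage(4, 3, extra_bits=8))
    # A block crossed by a sharp line: a further stage helps a lot.
    print(needs_further_stage(400, 40, extra_bits=8))
    # The same block when bits are weighted much more heavily.
    print(needs_further_stage(400, 40, extra_bits=8, bit_cost=50.0))
```

A nonlinear weighting, which the text also allows, would replace the
constant bit_cost by a function of the distortion value.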

The advantage of this procedure is the increased flexibility of the 
bit allocation across each frame. Consequently, the difficult regions can be 
encoded with a succession of difference modes and code vectors while sim- 
pler regions can be corrected once or left as they are. This flexibility in- 
creases the usage of the difference stages for any given bit rate. 

Due to the above-mentioned matters, the inventive arrangement needs
evaluation means for examining the need for additional coding stages. As
FIG. 8 shows, the evaluation means 102 can preferably be implemented in
the division modules used 101 (compare FIG. 4, 414, 415, and 410), but
the evaluation means can also be an individual module.

Preferable Implementation of the Invention

The inventive arrangement takes a difference block as input at
each difference mode stage and encodes it further in order to reduce the
remaining error in an efficient manner as compared with the additional bits
required. The difference block may be the result of any prior encoding, such
as basic VQ encoding, motion compensation, DCT, or DWT.

The inventive solution consists of two parts: the training of the
codebooks and the algorithm for utilizing them in video encoding. Let us, for
example, consider a frame from a gray-scale video, which has been encoded
with some combination of VQ and motion compensation using 8x8 block size.
The resulting difference image is divided into 4x4 blocks, which are to be
encoded in two further stages.

The training of the first difference codebook, codebook A, has
been performed with realistic difference material, but with the lowest
frequency, i.e., the constant component, removed. The standard k-means
algorithm tends to emphasize the lower frequencies, but cannot generate
fictitious finite (non-zero) averages to the resulting vectors. For a
codebook with 256 vectors, the frequencies are concentrated in the lower
half of the frequency table.

The second stage codebook, codebook B, is trained with difference
blocks where, e.g., one third of the lowest frequencies have been removed.
The resulting code vectors do have some weight in these frequencies due to
the training algorithm, but the emphasis is on the higher frequencies.
Therefore the code vectors from codebooks A and B can efficiently complement
each other. The fact that there is some overlap between the codebooks can be
utilized by combining two vectors from A or two vectors from B or one from
each. The overlap can be avoided by performing the training with the
transform coefficients before the inverse transformation.
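
The pre-processing and training of the two codebooks can be sketched as below. This is only an illustrative sketch under assumptions that the text does not fix: the DCT is used as the functional transform, "lowest frequencies" are taken to be the coefficients (u, v) with the smallest u + v, a cutoff of 3 stands in for "one third" (6 of 16 coefficients), and all function names are hypothetical.

```python
import math
import random
from typing import List

Block = List[List[float]]


def dct_matrix(n: int) -> Block:
    """Orthonormal DCT-II matrix; row k holds the k-th cosine basis vector."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def matmul(a: Block, b: Block) -> Block:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a: Block) -> Block:
    return [list(col) for col in zip(*a)]


def remove_low_frequencies(block: Block, cutoff: int) -> Block:
    """Zero the DCT coefficients (u, v) with u + v < cutoff, then invert."""
    c = dct_matrix(len(block))
    coeffs = matmul(matmul(c, block), transpose(c))
    for u, row in enumerate(coeffs):
        for v in range(len(row)):
            if u + v < cutoff:
                row[v] = 0.0
    return matmul(matmul(transpose(c), coeffs), c)


def kmeans(vectors: List[List[float]], k: int, iterations: int = 10,
           seed: int = 0) -> List[List[float]]:
    """Plain k-means with squared-error distance."""
    centroids = [list(v) for v in random.Random(seed).sample(vectors, k)]
    for _ in range(iterations):
        clusters: List[List[List[float]]] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: sum(
                (a - b) ** 2 for a, b in zip(v, centroids[j])))
            clusters[nearest].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster runs empty
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids


def train_codebook(blocks: List[Block], cutoff: int,
                   size: int) -> List[List[float]]:
    """Pre-process the difference blocks, then train `size` code vectors."""
    material = [[v for row in remove_low_frequencies(b, cutoff) for v in row]
                for b in blocks]
    return kmeans(material, size)
```

Under these assumptions, train_codebook(blocks, cutoff=1, size=256) would correspond to codebook A (constant component removed) and train_codebook(blocks, cutoff=3, size=256) to codebook B. Because every training vector for codebook A sums to zero, the averaged code vectors also sum to zero, which matches the remark that k-means cannot introduce averages of its own.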

The actual encoding proceeds by first searching for the best
matching vector from codebook A for each 4x4 block. Then the blockwise
reductions in the distortion are calculated and the induced rate-distortion
cost is compared with the cost without using the difference vectors. A
typical cost function is C = d + λb, where d is the distortion, λ is a
weighting factor, and b is the number of bits used for the block. It should
be noted that the weighting factor can also be attached to d, or the
weighting can be handled using separate weighting factors attached to d and
b. Code vectors are chosen only for those blocks for which this reduces the
cost. In the next step, best matching code vectors in codebook B are
searched for the remaining 4x4 difference blocks. Again, code vectors are
chosen only when it is cost efficient. The positions of the code vectors can
be expressed by single bits, so that one byte is enough to determine which
subblocks of the original 8x8 block are corrected with vectors from codebook
A and which from codebook B.
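
The two-codebook search and the one-byte position mask can be sketched as follows. This is a minimal sketch under assumptions: the bit layout (bits 0 to 3 for codebook A, bits 4 to 7 for codebook B), the fixed index_bits per code vector, and the choice to search codebook B on whatever residual is left after the codebook A step are illustrative, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sse(a: Vector, b: Vector) -> float:
    """Sum of squared errors between two flattened blocks."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(block: Vector, codebook: Sequence[Vector]) -> Tuple[int, float]:
    """Index and distortion of the nearest code vector (full search)."""
    distances = [sse(block, c) for c in codebook]
    index = min(range(len(codebook)), key=distances.__getitem__)
    return index, distances[index]


def encode_subblocks(subblocks: Sequence[Vector],
                     book_a: Sequence[Vector],
                     book_b: Sequence[Vector],
                     lam: float,
                     index_bits: int = 8) -> Tuple[int, List[int]]:
    """Encode the four 4x4 difference subblocks of one 8x8 block.

    Returns (mask, indices). Assumed layout: bit `pos` of the mask is set
    when subblock `pos` takes a vector from codebook A, bit `4 + pos` when
    it takes one from codebook B.
    """
    mask = 0
    indices: List[int] = []
    zero = [0.0] * len(subblocks[0])
    for pos, block in enumerate(subblocks):
        residual = list(block)
        for stage, book in enumerate((book_a, book_b)):
            index, d_after = best_match(residual, book)
            d_before = sse(residual, zero)
            # C = d + lam * b: use the vector only if it lowers the cost.
            if d_after + lam * index_bits < d_before:
                mask |= 1 << (4 * stage + pos)
                indices.append(index)
                residual = [r - c for r, c in zip(residual, book[index])]
    return mask, indices
```

The mask always fits in one byte because there are four subblocks and two codebooks. A decoder reading the mask in the same order knows how many indices follow and to which subblock and codebook each belongs.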

Finally, the code vectors are centered around zero and have
predominantly very small values. Such codebooks can be efficiently compressed
before transmitting them to the receiving end, thereby reducing the initial
waiting time for the video recipient.

FIG. 9 illustrates an example of a flow chart describing the
inventive method. The first step 81 is to pre-process training material for
predetermining the frequency distribution of the codevectors to be trained.
Although the pre-processing is made beforehand, it is an essential step for
achieving the desired performance of any arrangement according to the
invention. The next step 82 is to train codevectors using the pre-processed
training material. Codebooks are formed. Finally, information is
coded/decoded 83 using a cascaded VQ in a way that a necessary number of
stages of coding or decoding is used individually for each original block.

FIG. 10 illustrates an example of an arrangement for the invention.
In practical usage, the invention is embedded as a part of complete video
compression/decompression software. The compression, i.e. coding, software
91 is normally situated in a sending terminal 93. The software normally
consists of a user interface; media readers for reading in the video and
audio information; some form of basic encoding; the difference encoding
algorithms and codebooks proposed in this invention; some solution for
sending the stream; and a small decoding software package 92 to be
transmitted at the beginning of the video stream to a receiving terminal 94.
However, alternatively, the decoding software may be permanently situated in
the receiving terminal.

The invention combines the best properties of several of the
existing solutions. In short, it is a variant of the cascaded VQ with certain
improvements acquired from the DCT and DWT approaches. It should be noted
that the encoding of the original information can be made using any encoding
technique, such as VQ, motion compensation, or some functional transform,
and the difference information is handled using VQ. The only drawback of the
invention is the possibly slow codebook search, which has to be performed
once for each block and each codebook. However, this may be solved by using
any of a number of fast-search algorithms, such as the tree-search VQ,
developed for this purpose.

Although the inventive encoding is mostly described in this
context, it is clear that the invention also concerns decoding. When
decoding, the codebooks used must contain codevectors which are weighted for
a certain frequency distribution. Using these codebooks together, the
decompression result contains at least the most significant frequencies.
There also exist many alternative forms and adaptations of the invention.
For example, any form of 'basic' encoding of intra and inter frames (i.e.
blockwise or non-blockwise), functional transform or vector quantization,
can be an underlying technique for the inventive arrangement, since they all
leave a residual or difference between the original images and the
encoded/decoded ones. The invention may also be used as one step in a
sequence of difference encoding with a possibly varying block size in each
step. In other words, in each sequence (stage) the difference block may be
processed, for example using DCT, before coding the difference block; that
is to say, a pre-encoding takes place before the actual coding. The
difference can be encoded blockwise with any block size. A vector library
for the difference vectors may be trained in any basis, i.e., as image
blocks or functional transforms thereof. Codebook(s) may also be adaptively
modified during the encoding process. The encoding procedure and ideas
presented herein are applicable to any color presentation such as RGB, YUV,
YCrCb, CIELAB, etc.
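
On the decoding side mentioned at the start of the paragraph above, the cascaded correction amounts to adding the selected code vector of each used stage to the block produced by the basic decoding. The following is a minimal sketch under assumptions: blocks are flattened lists, an unused stage is marked with None, and decode_block is a hypothetical name.

```python
from typing import List, Optional, Sequence


def decode_block(base: Sequence[float],
                 stage_indices: Sequence[Optional[int]],
                 codebooks: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Add the code vector of every used difference stage to the base block.

    base is the block from the basic decoding; stage_indices[k] is the code
    vector index for stage k, or None when that stage was not used for this
    block (assumed convention); codebooks[k] is the codebook of stage k.
    """
    out = list(base)
    for index, book in zip(stage_indices, codebooks):
        if index is None:
            continue  # this block needed fewer stages
        out = [o + c for o, c in zip(out, book[index])]
    return out


# Toy example with made-up values: one vector per codebook.
book_a = [[4.0, -4.0, 4.0, -4.0]]
book_b = [[1.0, 0.0, -1.0, 0.0]]
corrected = decode_block([10.0, 10.0, 10.0, 10.0], [0, 0], [book_a, book_b])
```

Since only table look-ups and additions are involved, a decoder of this kind needs very little processing power, whichever number of stages a given block uses.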

To conclude, in the light of the above demands, there is a need for
a video compression technology which achieves high compression ratios
while retaining good perceptual image quality and whose decoding side
requires only minimal processing power. It is also evident that the invention
can be implemented in many ways within the scope of the inventive idea.

Claims

1. An encoding method for compressing data, in which method the
data is first encoded and difference data between the original data and the
encoded data is formed, the difference data is divided into one or more first
blocks, which are encoded at least at one stage, each stage comprising the
action of the encoding and, if needed for the next stage, an action of
calculating following difference blocks between the current difference
blocks and the encoded current difference blocks, performing the consecutive
stages in a way that the calculated difference blocks at the previous stage
are an input for the following stage, at each stage using a codebook, which
is specific for the encoding of the stage, until at a final stage, final
difference blocks between the previous difference blocks and the encoded
previous difference blocks are encoded using the last codebook, the
codebooks for said difference blocks containing codevectors trained with
training difference material, c h a r a c t e r i z e d in that prior to
the training, the training difference material is preprocessed for
individually adapting frequency distribution of each codevector for
weighting to particular information of the data, and encoding each block
independently using a necessary number of the stages needed for the
particular block.

2. A method according to claim 1, c h a r a c t e r i z e d in that
at least at one of said stages the difference blocks are divided into
sub-blocks for being used as difference blocks at the next stage.

3. A method according to claim 1, c h a r a c t e r i z e d in that
at least at one of said stages more than one codebook is used.

4. A method according to claim 2, c h a r a c t e r i z e d in that
at least at one of said stages more than one codebook is used.

5. A method according to claim 1, characterized in that
the preprocessing of the training material is made using discrete cosine
transform.

6. A method according to claim 1, c h a r a c t e r i z e d in that
the preprocessing of the training material is made using any functional
transform.

7. A method according to claim 1, c h a r a c t e r i z e d in that
the necessary number of stages is achieved using a cost function, in a way
that if the cost of using the additional stage is too much, the additional
stage is unnecessary.

8. A method according to claim 7, c h a r a c t e r i z e d in that
the cost function takes into account a remaining difference, and the number
of bits used, representing the cost of the stages, for coding the block in
question.

9. A method according to claim 8, c h a r a c t e r i z e d in that
the number of bits is weighted.

10. A method according to claim 2, c h a r a c t e r i z e d in
that the preprocessing of the training material is made using discrete cosine
transform.

11. A method according to claim 2, c h a r a c t e r i z e d in
that the preprocessing of the training material is made using any functional
transform.

12. A method according to claim 2, characterized in
that the necessary number of the stages is achieved using a cost function, in
a way that if the cost of using the additional stage is too much, the
additional stage is unnecessary.

13. A method according to claim 12, c h a r a c t e r i z e d in
that the cost function takes into account a remaining difference, and the
number of bits used, representing the cost of the stages, for coding the block
in question.

14. A method according to claim 13, c h a r a c t e r i z e d in
that the number of bits is weighted.

15. A method according to claim 3, characterized in
that the preprocessing of the training material is made using discrete cosine
transform.

16. A method according to claim 3, characterized in
that the preprocessing of the training material is made using any functional
transform.

17. A method according to claim 3, characterized in
that the necessary number of the stages is achieved using a cost function, in
a way that if the cost of using the additional stage is too much, the
additional stage is unnecessary.

18. A method according to claim 17, c h a r a c t e r i z e d in
that the cost function takes into account a remaining difference, and the
number of bits used, representing the cost of the stages, for coding the block
in question.



19. A method according to claim 18, c h a r a c t e r i z e d in 
that the number of bits is weighted. 

20. A method according to claim 4, c h a r a c t e r i z e d in
that the preprocessing of the training material is made using discrete cosine
transform.

21. A method according to claim 4, characterized in
that the preprocessing of the training material is made using any functional
transform.

22. A method according to claim 4, characterized in
that the necessary number of stages is achieved using a cost function, in a
way that if the cost of using the additional stage is too much, the additional
stage is unnecessary.

23. A method according to claim 22, c h a r a c t e r i z e d in
that the cost function takes into account a remaining difference, and the
number of bits used, representing the cost of the stages, for coding the block
in question.

24. A method according to claim 23, c h a r a c t e r i z e d in
that the number of bits is weighted.

25. A method according to claim 1, characterized in
that at least at one stage the difference blocks are processed before
encoding.

26. A method according to claim 2, characterized in
that at least at one stage the difference blocks are processed before
encoding.

27. A decoding method for decompressing data, the method
comprising codebooks for the decompression of encoded difference data,
c h a r a c t e r i z e d in that at least one of said codebooks contains
codevectors, which have been weighted to a specific frequency distribution,
and using the codebooks together performing a decompression result, which
comprises at least the most significant frequencies.

28. An encoder for compressing data, wherein the data is first
encoded and difference data between the original data and the encoded data is
formed, the difference data is divided into one or more first blocks, which are
encoded at least at one stage, each stage comprising the action of the
encoding and, if needed for the next stage, an action of calculating
following difference blocks between the current difference blocks and the
encoded current difference blocks, performing the consecutive stages in a way
that the calculated difference blocks at the previous stage are an input for
the following stage, at each stage using a codebook, which is specific for
the encoding of the stage, until at a final stage, final difference blocks
between previous difference blocks and the encoded previous difference blocks
are calculated and encoded using a last codebook, the codebooks for said
difference blocks containing codevectors trained with training difference
material, c h a r a c t e r i z e d in that at least one codebook used for
coding differences has been weighted to a specific frequency distribution,
and the encoder comprises evaluation means for assigning a necessary number
of the stages needed for the particular block.

29. An encoder according to claim 28, c h a r a c t e r i z e d in
that at least at one stage the difference blocks are divided into sub-blocks for
being used as difference blocks at the next stage.

30. A method according to claim 28, c h a r a c t e r i z e d in
that at least at one of said stages more than one codebook is used.

31. A method according to claim 29, c h a r a c t e r i z e d in
that at least at one of said stages more than one codebook is used.

32. An encoder according to claim 28, c h a r a c t e r i z e d in
that the evaluation means further comprises a cost function, which calculates
the cost of using the additional stage.

33. An encoder according to claim 32, c h a r a c t e r i z e d in
that the cost function takes into account a remaining difference, and the
number of bits used, representing the cost of the stages, for coding the block
in question.

34. An encoder according to claim 33, c h a r a c t e r i z e d in
that the number of bits is weighted.

35. An encoder according to claim 29, c h a r a c t e r i z e d in
that the evaluation means further comprises a cost function, which calculates
the cost of using the additional stage.

36. An encoder according to claim 35, c h a r a c t e r i z e d in
that the cost function takes into account a remaining difference, and the
number of bits used, representing the cost of the stages, for coding the block
in question.

37. An encoder according to claim 36, c h a r a c t e r i z e d in
that the number of bits is weighted.

38. An encoder according to claim 30, c h a r a c t e r i z e d in
that the evaluation means further comprises a cost function, which calculates
the cost of using the additional stage.

39. An encoder according to claim 38, c h a r a c t e r i z e d in 
that the cost function takes into account a remaining difference, and the 
number of bits used, representing the cost of the stages, for coding the block 
in question. 

40. An encoder according to claim 39, c h a r a c t e r i z e d in
that the number of bits is weighted.

41. An encoder according to claim 31, characterized in 
that the evaluation means further comprises a cost function, which calculates 
the cost of using the additional stage. 

42. An encoder according to claim 41, characterized in
that the cost function takes into account a remaining difference, and the 
number of bits used, representing the cost of the stages, for coding the block 
in question. 

43. An encoder according to claim 42, c h a r a c t e r i z e d in 
that the number of bits is weighted. 

44. A decoder using codebooks for the decompression of encoded
difference data, c h a r a c t e r i z e d in that at least one of the
codebooks has been weighted to a specific frequency distribution.

(57) Abstract

This invention relates to encoding and decoding images. 
The invention is a variant of the cascaded VQ with certain 
improvements acquired from the DCT and DWT ap- 
proaches. The fundamental aspects of the invention are 
that codebooks are pre-processed when training them for 
predetermining the frequency distribution of the resulting 
codevectors, and each block is independently coded and 
decoded using a variable number of stages of difference 
coding needed for coding the particular block.